Purpose: Actionable guidance for optimizing execution trees at the DSPy optimizer level (few-shot compilation, LM switching) and Elysia level (tree structure, context injection, prompt engineering). Concise, grounded snippets only.
Covers: few-shot compilation (LabeledFewShot, k=10), LM selection (base_lm vs complex_lm), fallback behavior, tree structure (_get_successive_actions), prompt templates (DecisionPrompt), context injection (ElysiaChainOfThought), tool gating (is_tool_available), and termination logic.

_get_successive_actions (structure-only)
Location: elysia/tree/tree.py
def _get_successive_actions(self, successive_actions: dict, current_options: dict) -> dict:
    for branch in current_options:
        successive_actions[branch] = {}
        if current_options[branch]["options"] != {}:
            successive_actions[branch] = self._get_successive_actions(
                successive_actions[branch], current_options[branch]["options"]
            )
    return successive_actions
What it does: Recursively maps the tree shape (branch → sub-branches) and injects it into DecisionPrompt.successive_actions as context.
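A standalone sketch of the same recursion (toy options dict, hypothetical branch names) shows the shape it produces:

```python
def get_successive_actions(successive_actions: dict, current_options: dict) -> dict:
    # Mirrors Tree._get_successive_actions: copy each branch name into the
    # output dict, recursing only where nested options exist.
    for branch in current_options:
        successive_actions[branch] = {}
        if current_options[branch]["options"] != {}:
            successive_actions[branch] = get_successive_actions(
                successive_actions[branch], current_options[branch]["options"]
            )
    return successive_actions

# Hypothetical two-level tree: "base" branches into "query" and "aggregate"
options = {
    "base": {
        "options": {
            "query": {"options": {}},
            "aggregate": {"options": {}},
        }
    }
}
shape = get_successive_actions({}, options)
# shape == {"base": {"query": {}, "aggregate": {}}}
```

Only branch names survive into the prompt, which is why deep trees inflate context: every level adds another layer of nested keys.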
Caveats for quantitative optimization:
Use the tree-construction API (.add_branch(), .add_tool()) to encode domain-specific structure. Example: for aggregation-heavy domains, create a shallow "aggregate" branch with many leaf tools; for exploration-heavy domains, create deeper chains with conditional gating.
Domain-specific structural prompting:
# Example: Quantitative finance domain
tree.add_branch(root=True, branch_id="base", instruction="Choose: data retrieval, quantitative analysis, or reporting")
tree.add_branch(branch_id="quant_analysis", from_branch_id="base",
    instruction="Select statistical method: correlation, regression, time-series, or risk metrics")
tree.add_tool(branch_id="quant_analysis", tool=CorrelationTool)
tree.add_tool(branch_id="quant_analysis", tool=RegressionTool)
# Result: successive_actions shows {"quant_analysis": {"correlation": {}, "regression": {}}}
# → LLM sees quantitative structure explicitly
DecisionPrompt for explainable, quantitative decisions
Location: elysia/tree/prompt_templates.py (L5-L144)
Key input fields for customization:
class DecisionPrompt(dspy.Signature):
    instruction: str = dspy.InputField(
        description="Specific guidance for this decision point that must be followed"
    )
    tree_count: str = dspy.InputField(
        description="Current attempt number as 'X/Y' where X=current, Y=max. Consider ending as X approaches Y."
    )
    available_actions: list[dict] = dspy.InputField(
        description="List of possible actions: {name: {function_name, description, inputs}}"
    )
    unavailable_actions: list[dict] = dspy.InputField(
        description="Actions unavailable now: {name: {function_name, available_at}}"
    )
    successive_actions: str = dspy.InputField(
        description="Actions that stem from current actions (nested dict structure)"
    )
    previous_errors: list[dict] = dspy.InputField(
        description="Errors from previous actions, organized by function_name"
    )
Quantitative customization strategies:
Inject scoring/metrics into instruction:
decision_node = DecisionNode(
    instruction="""Choose action with highest expected information gain.
    Priority: aggregate (cost=1, gain=high) > query (cost=3, gain=medium) > text (cost=0, gain=low).
    Current budget: 5 units."""
)
Encode tool metadata in available_actions descriptions:
tool.description = """Correlation analysis tool.
Complexity: O(n²). Typical runtime: 2s for n=1000.
Output: correlation matrix + p-values.
Best for: identifying linear relationships in numerical data."""
Use tree_count for adaptive strategies:
# In DecisionPrompt docstring or instruction:
"""If tree_count shows X/Y where X > Y*0.8, prioritize low-cost actions or termination."""
Leverage unavailable_actions.available_at for conditional logic:
# In tool's is_tool_available() docstring:
"""Available after: 1) data retrieved, 2) schema validated, 3) min 100 rows present."""
# → LLM sees explicit prerequisites for quantitative workflows
Output fields for explainability:
function_name: str = dspy.OutputField(
    description="Select exactly one function name from available_actions..."
)
function_inputs: dict[str, Any] = dspy.OutputField(
    description="Inputs for selected function. Must match available_actions[function_name]['inputs']."
)
end_actions: bool = dspy.OutputField(
    description="Has end_goal been achieved? Set True to terminate after this action."
)
Caveat: DecisionPrompt has no built-in reasoning or confidence output by default. To add explainability:
# Modify ElysiaChainOfThought initialization:
decision_module = ElysiaChainOfThought(
    DecisionPrompt,
    tree_data=tree_data,
    reasoning=True,  # ← Adds reasoning: str output field
    ...
)
# Now output.reasoning contains step-by-step justification
ElysiaChainOfThought for token efficiency
Location: elysia/util/elysia_chain_of_thought.py
Optional context inputs (turn on deliberately):
ElysiaChainOfThought(
    signature=DecisionPrompt,
    tree_data=tree_data,
    environment=True,         # ← Retrieved objects from prior actions
    collection_schemas=True,  # ← DB schemas (EXPENSIVE)
    tasks_completed=True,     # ← Chain-of-thought history
    reasoning=True,           # ← Step-by-step output
    message_update=True,      # ← User-facing status
)
Token control strategies:
# Narrow schemas to specific collections:
ElysiaChainOfThought(..., collection_schemas=True, collection_names=["trades", "prices"])
# Implementation (elysia_chain_of_thought.py L328-338):
if self.collection_schemas:
    if self.collection_names != []:
        kwargs["collection_schemas"] = self.tree_data.output_collection_metadata(
            collection_names=self.collection_names, with_mappings=False
        )
    else:
        kwargs["collection_schemas"] = self.tree_data.output_collection_metadata(with_mappings=False)
Quantitative domain example:
# For financial analysis: only inject schemas when choosing data tools
if current_branch == "data_retrieval":
    decision_module = ElysiaChainOfThought(..., collection_schemas=True, collection_names=["market_data"])
else:
    decision_module = ElysiaChainOfThought(..., collection_schemas=False)  # Save tokens
Tool gating (is_tool_available)
Location: elysia/tree/tree.py, Tree._get_available_tools()
async def _get_available_tools(self, current_decision_node, client_manager):
    available_tools = []
    unavailable_tools = []
    for tool in current_decision_node.options.keys():
        if "is_tool_available" in dir(self.tools[tool]) and await self.tools[tool].is_tool_available(
            tree_data=self.tree_data, base_lm=self.base_lm, complex_lm=self.complex_lm, client_manager=client_manager
        ):
            available_tools.append(tool)
        else:
            is_tool_available_doc = (
                self.tools[tool].is_tool_available.__doc__.strip()
                if self.tools[tool].is_tool_available.__doc__ else ""
            )
            unavailable_tools.append((tool, is_tool_available_doc))
    return available_tools, unavailable_tools
Quantitative gating example:
class RegressionTool(Tool):
    async def is_tool_available(self, tree_data, **kwargs):
        """Available when: min 30 data points, 2+ numerical columns, no missing values > 10%."""
        if "data" not in tree_data.environment.environment:
            return False
        data = tree_data.environment.environment["data"]
        return len(data.objects) >= 30 and data.metadata.get("numerical_cols", 0) >= 2
Caveat: The docstring is surfaced to the LLM as unavailable_actions[tool]["available_at"]. Make it actionable:
"Not available yet""Available after retrieving ≥30 rows with ≥2 numerical columns"LabeledFewShotLocation: elysia/util/elysia_chain_of_thought.py, aforward_with_feedback_examples()
examples, uuids = await retrieve_feedback(client_manager, self.tree_data.user_prompt, feedback_model, n=10)
if len(examples) > 0:
    optimizer = dspy.LabeledFewShot(k=10)  # ← Fixed k=10
    optimized_module = optimizer.compile(self, trainset=examples)
else:
    return await self.aforward(lm=complex_lm, **kwargs)  # ← Fallback: no examples → complex LM

# LM selection by example count:
if len(examples) < num_base_lm_examples:  # default 3
    return await optimized_module.aforward(lm=complex_lm, **kwargs)
else:
    return await optimized_module.aforward(lm=base_lm, **kwargs)
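The selection rule reduces to a small pure function (hypothetical name select_lm, shown here without the async/dspy machinery):

```python
def select_lm(num_examples: int, num_base_lm_examples: int = 3) -> str:
    # With no examples, or fewer than num_base_lm_examples, fall back to the
    # more capable complex LM; otherwise the few-shot context lets the
    # cheaper base LM suffice.
    return "complex_lm" if num_examples < num_base_lm_examples else "base_lm"

select_lm(0)  # → "complex_lm"
select_lm(2)  # → "complex_lm"
select_lm(3)  # → "base_lm"
```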
Caveats:
k=10 is hardcoded; modify source to tune.
num_base_lm_examples=3 threshold: <3 examples → use complex LM (more capable), ≥3 → use base LM (faster/cheaper).
Quantitative tuning:
Set num_base_lm_examples=10 to always use the complex LM; set num_base_lm_examples=1 to prefer the base LM.

Feedback retrieval
Location: elysia/util/retrieve_feedback.py
if not await client.collections.exists("ELYSIA_FEEDBACK__"):
    return [], []  # ← No collection → no examples
superpositive = await feedback_collection.query.near_text(
    query=user_prompt, filters=Filter(..., feedback==2.0), certainty=0.7, limit=n
)
if len(superpositive.objects) < n:
    positive = await feedback_collection.query.near_text(
        query=user_prompt, filters=Filter(..., feedback==1.0), certainty=0.7, limit=n
    )
    feedback_objects = superpositive.objects + positive.objects[:(n - len(superpositive.objects))]
random.shuffle(relevant_updates)  # ← Introduces variability
relevant_updates = relevant_updates[:n]
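The two-tier merge (superpositive hits first, topped up with positive hits) can be sketched as a pure function over plain lists; the seed parameter is an addition for determinism, since the source shuffles unseeded:

```python
import random

def merge_feedback(superpositive: list, positive: list, n: int, seed=None) -> list:
    # Take all superpositive (feedback == 2.0) hits first, then top up
    # with positive (feedback == 1.0) hits until n examples are collected.
    selected = superpositive + positive[: n - len(superpositive)]
    rng = random.Random(seed)  # seeded here for reproducibility
    rng.shuffle(selected)
    return selected[:n]

examples = merge_feedback(["a", "b"], ["c", "d", "e"], n=4, seed=0)
# Contains "a" and "b" plus two of the positives, in shuffled order
```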
Caveats:
Requires an ELYSIA_FEEDBACK__ collection in Weaviate.
certainty=0.7 threshold: lower → more examples (noisier), higher → fewer examples (stricter).
Quantitative optimization:
Remove random.shuffle() or seed it for determinism.
Add domain filters (e.g., Filter.by_property("domain").equal("finance")).

Assertion-based retries
Location: elysia/tree/util.py, AssertedModule
class AssertedModule(dspy.Module):
    def __init__(self, module, assertion: Callable, max_tries: int = 3):
        self.assertion = assertion  # (kwargs, pred) → (bool, feedback_str)
        self.max_tries = max_tries

    async def aforward(self, **kwargs):
        pred = await self.module.acall(**kwargs)
        num_tries = 0
        asserted, feedback = self.assertion(kwargs, pred)
        while not asserted and num_tries <= self.max_tries:
            asserted_module = self.modify_signature_on_feedback(pred, feedback)
            pred = await asserted_module.aforward(
                previous_feedbacks=self.previous_feedbacks,
                previous_attempts=self.previous_attempts,
                **kwargs
            )
            asserted, feedback = self.assertion(kwargs, pred)
            num_tries += 1
        return pred
Quantitative customization:
# Example: Strict assertion for tool selection
def _tool_assertion(kwargs, pred):
    valid = pred.function_name in self.options
    feedback = f"Must choose from {list(self.options.keys())}" if not valid else ""
    return valid, feedback

decision_executor = AssertedModule(decision_module, assertion=_tool_assertion, max_tries=2)
Caveat: Keep max_tries low (1-3) to avoid token/cost explosion. Each retry adds previous attempts to context.
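The retry loop can be exercised without DSPy using a plain callable in place of the module (flaky_module and tool_assertion below are hypothetical):

```python
def asserted_call(module, assertion, max_tries: int = 2, **kwargs):
    # Dependency-free sketch of AssertedModule's loop: call, check the
    # assertion, and re-call with accumulated feedback until it passes
    # or the retry budget is exhausted.
    feedbacks = []
    pred = module(**kwargs)
    ok, feedback = assertion(kwargs, pred)
    tries = 0
    while not ok and tries < max_tries:
        feedbacks.append(feedback)
        pred = module(previous_feedbacks=feedbacks, **kwargs)
        ok, feedback = assertion(kwargs, pred)
        tries += 1
    return pred, ok

# Hypothetical flaky module: returns a valid name only on the second call
calls = {"n": 0}
def flaky_module(**kwargs):
    calls["n"] += 1
    return "bad_tool" if calls["n"] == 1 else "query"

def tool_assertion(kwargs, pred):
    valid = pred in {"query", "aggregate"}
    return valid, "" if valid else "Must choose from ['query', 'aggregate']"

pred, ok = asserted_call(flaky_module, tool_assertion, max_tries=2)
# pred == "query" after one retry
```

Each retry carries all prior feedback back into the call, which is exactly why token cost grows with max_tries.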
- Keep the _get_successive_actions output shallow to minimize tokens.
- Use add_branch(instruction=...) to inject quantitative guidance (costs, priorities, constraints) into DecisionPrompt.instruction.
- Use tree_count for adaptive strategies (e.g., "if X > 0.8*Y, prefer termination").
- Write actionable is_tool_available() docstrings (surfaced as available_at).
- Enable reasoning=True for explainability.
- Inject collection_schemas only when needed; constrain via collection_names.
- Enable tasks_completed=True to avoid repeats (adds tokens but prevents loops).
- Enable environment=True when prior results matter; skip it for stateless decisions.
- Verify ELYSIA_FEEDBACK__ exists before enabling USE_FEEDBACK.
- Tune the num_base_lm_examples threshold (default 3) for the cost/quality tradeoff.
- Change k=10 in source if the domain needs more or fewer examples.
- Set max_tries conservatively (1-3) in AssertedModule.
- Seed or remove random.shuffle() in retrieve_feedback.py for determinism.
- Use Tree.log_token_usage() to measure the impact of schemas/reasoning.
- Monitor tree_count to detect loops and previous_errors to identify systematic failures.

The following diagram illustrates Elysia's MCTS implementation with integrated DSPy components and feedback mechanisms:

Note: View the mermaid diagram source or PNG version
DSPy primitives used:
- dspy.Module: Base class for all reasoning modules (ElysiaChainOfThought)
- dspy.Signature: Defines input/output structure (DecisionPrompt)
- dspy.LabeledFewShot: Optimizer that compiles modules with labeled examples
- dspy.Prediction: Structured LLM output predictions

Primary Module: elysia.tree.util.DecisionNode
Full Path: /elysia/tree/util.py (lines 218-502)
The DecisionNode class implements the selection phase of MCTS, choosing the most promising action from available options.
Key Components:
- Decision module: ElysiaChainOfThought with DecisionPrompt signature
- Prompt template: elysia.tree.prompt_templates.DecisionPrompt (/elysia/tree/prompt_templates.py lines 5-130)
- The __call__ method evaluates available tools and makes decisions

# Core decision-making process
decision_module = ElysiaChainOfThought(
    DecisionPrompt,
    tree_data=tree_data,
    environment=True,
    collection_schemas=self.use_elysia_collections,
    tasks_completed=True,
    message_update=True,
    reasoning=tree_data.settings.BASE_USE_REASONING,
)
decision_executor = AssertedModule(
    decision_module,
    assertion=self._tool_assertion,
    max_tries=2,
)
DSPy Module: ElysiaChainOfThought extends dspy.Module
Full Path: /elysia/util/elysia_chain_of_thought.py (lines 24-421)
Primary Module: elysia.tree.tree.Tree._get_successive_actions()
Full Path: /elysia/tree/tree.py (method within Tree class)
The exploration mechanism evaluates future possible actions to inform current decisions.
Key Components:
- The _get_available_tools() method determines which actions are currently possible
- Tool gating via is_tool_available() and run_if_true() methods

# Exploration through successive actions
successive_actions = self._get_successive_actions(
    successive_actions={},
    current_options=init_options,
)

# Decision considers future possibilities
available_actions: list[dict] = dspy.InputField(
    description="List of possible actions to choose from for this task only"
)
successive_actions: str = dspy.InputField(
    description="Actions that stem from actions you can choose from"
)
Primary Module: elysia.tree.util.AssertedModule
Full Path: /elysia/tree/util.py (lines 153-215)
The evaluation phase ensures decisions meet quality criteria through assertion-based feedback loops.
Key Components:
class AssertedModule(dspy.Module):
    """
    A module that calls another module until it passes an assertion function.
    This function returns a tuple of (asserted, feedback).
    If the assertion is false, the module is called again with the previous feedbacks and attempts.
    """

    def __init__(
        self,
        module: ElysiaChainOfThought,
        assertion: Callable[[dict, dspy.Prediction], tuple[bool, str]],
        max_tries: int = 3,
    ):
        self.assertion = assertion
        self.module = module
        self.max_tries = max_tries
        self.previous_feedbacks = []
        self.previous_attempts = []
DSPy Integration: Uses dspy.Module base class with custom assertion logic
Primary Module: elysia.tree.util.CopiedModule
Full Path: /elysia/tree/util.py (lines 77-152)
The back propagation mechanism updates the system based on previous decisions and their outcomes.
Key Components:
Module: elysia.util.objects.TrainingUpdate
Purpose: Stores decision outcomes for learning
results = [
    TrainingUpdate(
        module_name="decision",
        inputs=tree_data.to_json(),
        outputs={k: v for k, v in output.__dict__["_store"].items()},
    ),
    Status(str(self.options[output.function_name]["status"])),
]
Module: elysia.util.retrieve_feedback.retrieve_feedback
Full Path: /elysia/util/retrieve_feedback.py (lines 8-92)
Retrieves similar examples from the feedback database for few-shot learning:
async def retrieve_feedback(
    client_manager: ClientManager,
    user_prompt: str,
    model: str,
    n: int = 6
) -> tuple[list[dspy.Example], list[str]]:
    """
    Retrieve similar examples from the database.
    """
Module: elysia.tree.util.CopiedModule
Integrates previous failed attempts into new decision attempts:
class CopiedModule(dspy.Module):
    """
    A module that copies another module and adds a previous_feedbacks field to the signature.
    This is used to store the previous errored decision attempts for the decision node.
    """

    def __init__(self, module: ElysiaChainOfThought, **kwargs):
        feedback_desc = (
            "Pairs of INCORRECT previous attempts at this action, and the feedback received for each attempt. "
            "Judge what was incorrect in the previous attempts. "
            "Follow the feedback to improve your next attempt."
        )
Module: elysia.util.elysia_chain_of_thought.ElysiaChainOfThought.aforward_with_feedback_examples
Full Path: /elysia/util/elysia_chain_of_thought.py (lines 345-421)
async def aforward_with_feedback_examples(
    self,
    feedback_model: str,
    client_manager: ClientManager,
    base_lm: dspy.LM,
    complex_lm: dspy.LM,
    num_base_lm_examples: int = 3,
    return_example_uuids: bool = False,
    **kwargs,
) -> tuple[dspy.Prediction, list[str]] | dspy.Prediction:
    """
    Use the forward pass of the module with feedback examples.
    This will first retrieve examples from the feedback collection,
    and use those as few-shot examples to run the module.
    """
    examples, uuids = await retrieve_feedback(
        client_manager, self.tree_data.user_prompt, feedback_model, n=10
    )
    if len(examples) > 0:
        optimizer = dspy.LabeledFewShot(k=10)
        optimized_module = optimizer.compile(self, trainset=examples)
Elysia leverages DSPy's sophisticated optimization framework to continuously improve its MCTS decision-making capabilities. The optimization process operates at multiple levels, from individual decision nodes to the entire reasoning pipeline.
The optimization in Elysia follows a hierarchical few-shot learning approach where:
Module: dspy.LabeledFewShot
Purpose: Implements few-shot learning by selecting the most relevant examples from a training set
Key Features:
- Selects up to k examples (default k=10 in Elysia)
- With sample=True, randomly selects k examples from the available training set
- With sample=False, takes the first k examples from the training set

# LabeledFewShot Implementation in Elysia
optimizer = dspy.LabeledFewShot(k=10)
optimized_module = optimizer.compile(
    student=self,        # The ElysiaChainOfThought module to optimize
    trainset=examples,   # Retrieved feedback examples
    sample=True          # Use random sampling for example selection
)
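The selection behavior described above can be imitated with a small pure function; this is a sketch of the documented semantics, not DSPy's implementation, and the seed parameter is an addition for reproducibility:

```python
import random

def labeled_few_shot_select(trainset: list, k: int, sample: bool, seed=None) -> list:
    # sample=True picks k examples at random from the training set;
    # sample=False takes the first k in order.
    if sample:
        rng = random.Random(seed)
        return rng.sample(trainset, min(k, len(trainset)))
    return trainset[:k]

demos = labeled_few_shot_select(list(range(20)), k=10, sample=False)
# demos == [0, 1, ..., 9]
```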
Optimization Process:
retrieve_feedback() fetches similar examples from Weaviate.

Elysia employs different optimization strategies for different model types:
# Base LM optimization (faster, simpler reasoning)
base_optimizer = dspy.LabeledFewShot(k=3)
base_optimized = base_optimizer.compile(module, trainset=base_examples)
# Complex LM optimization (deeper reasoning)
complex_optimizer = dspy.LabeledFewShot(k=10)
complex_optimized = complex_optimizer.compile(module, trainset=complex_examples)
The system retrieves examples based on:
# Adaptive k-value based on available examples
k_value = min(10, len(examples)) if len(examples) > 0 else 0
optimizer = dspy.LabeledFewShot(k=k_value)
# Context-aware example selection
if task_complexity == "high":
    optimizer = dspy.LabeledFewShot(k=15)  # More examples for complex tasks
else:
    optimizer = dspy.LabeledFewShot(k=5)   # Fewer examples for simple tasks
Primary Limitation: LabeledFewShot relies on a fixed sample size (k=10), which creates several learning and generalization challenges:
When sample=True (default in Elysia), the optimizer randomly selects examples, which can lead to:
The current implementation uses static optimization that doesn't adapt based on:
The fixed sample size approach has several generalization limitations:
# Current approach - fixed k=10
optimizer = dspy.LabeledFewShot(k=10) # Always uses exactly 10 examples
# Problems:
# 1. May not be enough for complex multi-step reasoning
# 2. May be too many for simple binary decisions
# 3. No adaptation based on example quality or relevance
# 4. No consideration of example diversity or coverage
# Proposed adaptive approach
def adaptive_k_selection(examples, task_complexity, context_diversity):
    base_k = 5
    complexity_multiplier = {"simple": 1, "medium": 2, "complex": 3}
    diversity_bonus = min(5, context_diversity * 2)
    return min(20, base_k * complexity_multiplier[task_complexity] + diversity_bonus)
k_value = adaptive_k_selection(examples, task_complexity, context_diversity)
optimizer = dspy.LabeledFewShot(k=k_value)
# Proposed quality-aware selection
def select_quality_examples(examples, k, quality_threshold=0.8):
    # Filter by quality score
    high_quality = [ex for ex in examples if ex.quality_score >= quality_threshold]
    # If not enough high-quality examples, include medium quality
    if len(high_quality) < k:
        medium_quality = [ex for ex in examples if 0.6 <= ex.quality_score < quality_threshold]
        selected = high_quality + medium_quality[:k - len(high_quality)]
    else:
        selected = high_quality[:k]
    return selected
# Proposed dynamic optimization
class AdaptiveLabeledFewShot:
    def __init__(self, min_k=3, max_k=20):
        self.min_k = min_k
        self.max_k = max_k
        self.performance_history = []

    def compile(self, student, trainset, context_metadata=None):
        # Determine optimal k based on context and history
        optimal_k = self._determine_optimal_k(trainset, context_metadata)
        # Use quality-based selection
        selected_examples = self._select_quality_examples(trainset, optimal_k)
        # Compile with selected examples
        return dspy.LabeledFewShot(k=optimal_k).compile(student, trainset=selected_examples)
The optimization process is deeply integrated into Elysia's MCTS workflow:
This sophisticated optimization framework enables Elysia to continuously improve its decision-making capabilities while maintaining the systematic exploration and exploitation balance characteristic of MCTS algorithms.
While Elysia currently uses LabeledFewShot, several other DSPy optimizers could provide enhanced learning capabilities:
Module: dspy.BootstrapFewShot
Purpose: Generates synthetic examples by running the module on unlabeled inputs and using high-confidence outputs as training examples
Potential in Elysia: Could generate synthetic decision examples from historical tree states, expanding the training set beyond human-annotated feedback
Module: dspy.MIPROv2
Purpose: Multi-prompt optimization that generates and optimizes multiple prompt variations
Potential in Elysia: Could optimize different prompt templates for different decision contexts (e.g., tool selection vs. parameter optimization)
Module: dspy.COPRO
Purpose: Coordinate ascent optimization for prompt engineering
Potential in Elysia: Could optimize the decision prompts used in DecisionPrompt signature for better reasoning quality
Module: dspy.BootstrapFinetune
Purpose: Combines few-shot learning with model fine-tuning
Potential in Elysia: Could fine-tune specialized models for different types of decisions (e.g., tool selection vs. parameter tuning)
Module: dspy.GEPA
Purpose: Evolutionary optimization of prompts using genetic algorithms
Potential in Elysia: Could evolve decision-making prompts over time, adapting to changing user patterns and task requirements
| Optimizer | Learning Type | Sample Efficiency | Generalization | Computational Cost | Best Use Case in Elysia |
|---|---|---|---|---|---|
| LabeledFewShot | Few-shot | High | Limited | Low | Current implementation, simple decisions |
| BootstrapFewShot | Self-supervised | Medium | Good | Medium | Generating synthetic examples |
| MIPROv2 | Multi-prompt | High | Excellent | High | Complex decision contexts |
| COPRO | Coordinate ascent | Medium | Good | Medium | Prompt optimization |
| BootstrapFinetune | Fine-tuning | Low | Excellent | Very High | Specialized decision models |
| GEPA | Evolutionary | Low | Excellent | High | Long-term adaptation |
Location: elysia.tree.tree.Tree.__init__
Location: elysia.tree.tree.Tree.async_run (/elysia/tree/tree.py lines 1431-1550)
while True:
    # Selection: Get available tools
    available_tools, unavailable_tools = await self._get_available_tools(
        current_decision_node, client_manager
    )
    # Exploration: Get successive actions
    successive_actions = self._get_successive_actions(
        successive_actions={},
        current_options=init_options,
    )
    # Decision: Make choice using MCTS-like reasoning
    decision, results = await current_decision_node(
        tree_data=self.tree_data,
        base_lm=self.base_lm,
        complex_lm=self.complex_lm,
        available_tools=available_tools,
        unavailable_tools=unavailable_tools,
        successive_actions=successive_actions,
        client_manager=client_manager,
    )
    # Evaluation & Back Propagation: Store results for learning
    self.training_updates.extend(results)
Location: Throughout the decision process
Elysia extensively uses the DSPy framework for structured LLM interactions:
# Create decision node
decision_node = DecisionNode(
    id="root",
    options=available_tools,
    instruction="Choose the best tool for the task"
)

# Make decision with MCTS-like reasoning
decision, results = await decision_node(
    tree_data=tree_data,
    base_lm=base_lm,
    complex_lm=complex_lm,
    available_tools=tools,
    unavailable_tools=[],
    successive_actions=future_actions,
    client_manager=client_manager,
)
# Enable feedback in settings
tree_data.settings.USE_FEEDBACK = True
# Decision will automatically use historical examples
output, uuids = await decision_executor.aforward_with_feedback_examples(
    feedback_model="decision",
    client_manager=client_manager,
    base_lm=base_lm,
    complex_lm=complex_lm,
    **decision_inputs
)
def custom_assertion(kwargs, pred):
    # Custom validation logic
    is_valid = pred.function_name in allowed_functions
    feedback = f"Must choose from: {allowed_functions}" if not is_valid else ""
    return is_valid, feedback

# Use with AssertedModule
decision_executor = AssertedModule(
    decision_module,
    assertion=custom_assertion,
    max_tries=5,
)
Elysia's MCTS implementation provides a sophisticated framework for decision-making in complex reasoning tasks. The system combines the systematic exploration of MCTS with the contextual understanding of large language models, enhanced by continuous learning through feedback mechanisms.
The modular design allows for fine-grained control over each aspect of the reasoning process, from initial choice selection to final evaluation and learning. The extensive use of the DSPy framework ensures structured, optimizable interactions with language models throughout the process.